Abstract

With the discovery of new exploit techniques, new protection mechanisms are needed as well. Mitigations like DEP (Data Execution Prevention) or ASLR (Address Space Layout Randomization) have created a significantly more difficult environment for vulnerability exploitation. Attackers, however, have recently developed new exploitation methods which are capable of bypassing the operating system's security protection mechanisms.

In this paper we present a short summary of novel and known mitigation techniques against return-oriented programming (ROP) attacks. The techniques described in this article apply mostly to x86-32¹ processors and Microsoft Windows operating systems.



1 Introduction 

In order to increase the security level of the operating system, Microsoft has implemented several mitigation mechanisms, such as DEP and ASLR. Data Execution Prevention (DEP) is a security feature that prohibits an application from executing code from a non-executable memory area. To exploit a vulnerability under DEP, an attacker must find an executable memory region and be able to fill it with the necessary data (e.g., shellcode instructions). Generally, the addition of the DEP mechanism makes achieving this goal with old exploitation techniques significantly more difficult. As a result, attackers improved upon the classic "return-into-libc" technique and started using return-oriented programming (ROP) [2] [7] to bypass Data Execution Prevention.

¹ Some of the techniques can also be applied on other architectures, although some of them are only available for the x86-32 family (e.g., the ones based on creating new segment descriptors).

Techniques like ROP still depend on the attacker knowing the characteristics of the memory layout, which led Microsoft to implement Address Space Layout Randomization (ASLR) as a countermeasure. ASLR makes the layout of an application's address space less predictable by relocating the base addresses of executable modules and other memory mappings. In this article we present novel and known mechanisms created specifically to prevent attackers from exploiting vulnerabilities with ROP, the technique introduced to bypass DEP. The presented mitigations are divided into two general categories:

• Compiler-level mitigations: mitigations that are applied by the compiler or linker.

• Binary-level mitigations: mitigations that can be applied without access to the source code of the protected code fragment.

2 Return-oriented Programming

Return-oriented programming is a known exploitation technique which allows the attacker to use stack memory to indirectly execute previously picked instruction sequences (so-called gadgets). Typically, each gadget ends with the x86 subroutine return instruction² (RET), which transfers execution to the next gadget or to the payload itself.



² However, other instructions may be used as well, such as JMP reg, CALL reg, etc.

For more information regarding the return-oriented programming technique, please refer to [1] [3] [7].

3 Compiler-level mitigations

In this section we present ROP protection mechanisms which can be applied at the compiler level. However, this does not mean they cannot be implemented at the binary level; they are simply substantially easier to implement at the compiler level. We will also try to underline the advantages and disadvantages of the described mechanisms.

The biggest disadvantage of compiler-level mitigations is that they require code recompilation in order to be effective. In the real world it is often hard to deploy such changes quickly.

3.1 Call-Ret relations

As previously stated, most gadgets use return instructions to transfer execution control to another gadget or to the payload. In order to find useful gadgets, attackers scan the process memory or the binary module for return instruction opcodes and, once such an opcode is found, they perform backward disassembly to decide whether the resulting gadget is useful (correct) or not. Return instruction opcodes can often be found in the middle of other instructions. Results, however, show that most of the time the original RET instructions are used. Typically they also represent the largest share of return opcodes found in the module's entire executable area (cf. Figure 1). For the remainder of this article, RET instructions emitted in the original program's code will be called "original return instructions".

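To make the scanning step concrete, here is a minimal C sketch, assuming a raw byte buffer that holds a module's code. It only locates bytes that decode as near RET opcodes (0xC3, and 0xC2 for RET imm16), which is the first stage of a gadget search; backward disassembly is not shown. The function names are ours, for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Near-return opcodes: RET (0xC3) and RET imm16 (0xC2 iw).
   Far returns (0xCB, 0xCA) are ignored in this sketch. */
static int is_near_ret_opcode(uint8_t b) {
    return b == 0xC3 || b == 0xC2;
}

/* Scans a raw code buffer for bytes that decode as a near RET opcode and
   stores their offsets in out (at most max_out entries). Returns the total
   number of candidates found, which may exceed max_out. Each candidate is a
   possible gadget end point; a real gadget finder would then try to
   disassemble backwards from every offset. */
size_t find_ret_candidates(const uint8_t *code, size_t len,
                           size_t *out, size_t max_out) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (is_near_ret_opcode(code[i])) {
            if (count < max_out)
                out[count] = i;
            count++;
        }
    }
    return count;
}
```

For the bytes 8B C3 5D C3 (MOV EAX, EBX; POP EBP; RET) the sketch reports offsets 1 and 3: the second is an original return instruction, while the first sits inside the ModR/M byte of the MOV.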


3.1.1 Testing for CALLs 

In typical applications, every procedure (function) is invoked with a procedure-call instruction. Every CALL instruction saves procedure linking information on the stack and branches to the procedure specified by the destination operand. Our ROP mitigation technique relies on the fact that each return address popped from the stack by the RET instruction is preceded by a CALL instruction. When a ROP attack occurs, the return address points to another gadget (or, finally, to the payload). It is unlikely that an attacker will be able to pick gadget addresses that are preceded by CALL instructions (see Table 1 for details). Testing for a CALL instruction located just before the return address popped from the stack should therefore be a reliable method against ROP attacks.

Figure 1: RET opcode offsets in sample modules (an offset equal to 1 indicates an original RET instruction).



Module Name     N1 [#]    N2 [#]
ntdll.dll       6528      138 (2.11%)
ieframe.dll     45232     2109 (4.66%)
bib.dll         5966      317 (5.31%)
aswEngin.dll    50895     1547 (3.03%)

Table 1: Number of gadgets preceded by relative, memory indirect, or register indirect procedure-call instructions ("minimal/not extended" addressing mode assumed).

Where:

• N1 represents the total number of gadgets

• N2 represents the number of gadgets preceded by a procedure-call instruction

• a gadget is a valid single instruction or sequence of instructions, with no filtering applied regarding the gadget's usefulness

However, the method itself has some drawbacks. First, CALL instructions can be encoded in various ways (relative, absolute, indirect), which affects both the scanner's performance and the reliability of this method. Second, only original return instructions can be protected; in other words, gadgets linked with different instructions (like indirect jumps or calls) will not be detected. On the other hand, the CALL opcode check can be driven by opcode-frequency statistics, which could reduce the potential performance slowdown. Additionally, since only selected return instructions (those valuable to an attacker) need to be protected, this should have a positive influence on the program's performance.

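The CALL test can be sketched as a byte-pattern check on the code that precedes a return address. This is a simplified illustration, not a complete decoder: the function name preceded_by_call is hypothetical, the return address is modelled as an offset into a code buffer, and only the minimal encodings counted in Table 1 (relative, register indirect and memory indirect calls) are recognized.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if the bytes immediately before code[ret_off] end with a
   procedure-call instruction, i.e. if code[ret_off] is a plausible return
   address. Only a minimal set of 32-bit encodings is recognized:
     E8 rel32        relative call                       (5 bytes)
     FF 15 disp32    memory indirect, call [disp32]      (6 bytes)
     FF /2, mod=11   register indirect, e.g. call eax    (2 bytes)
     FF /2, mod=00   memory indirect, e.g. call [eax]    (2 bytes, rm != 100, 101)
   Prefixes, SIB forms and disp8/disp32 register forms are not handled.
   This is a byte-pattern heuristic, not an instruction decoder. */
int preceded_by_call(const uint8_t *code, size_t ret_off) {
    if (ret_off >= 5 && code[ret_off - 5] == 0xE8)
        return 1;
    if (ret_off >= 6 && code[ret_off - 6] == 0xFF && code[ret_off - 5] == 0x15)
        return 1;
    if (ret_off >= 2 && code[ret_off - 2] == 0xFF) {
        uint8_t modrm = code[ret_off - 1];
        uint8_t mod = modrm >> 6;
        uint8_t reg = (modrm >> 3) & 7; /* /2 selects CALL near */
        uint8_t rm = modrm & 7;
        if (reg == 2 && (mod == 3 || (mod == 0 && rm != 4 && rm != 5)))
            return 1;
    }
    return 0;
}
```

A location directly after CALL EAX (FF D0) or CALL [0x406010] (FF 15 10 60 40 00) is accepted, while a gadget such as POP EBP; RET carved out of MOV EAX, EBX; POP EBP; RET is rejected. Because the check only inspects preceding bytes, a location that merely follows bytes resembling a CALL encoding would also pass.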
3.1.2 Emitting magic values 

This method was introduced by the PaX Team [S] and relies on emitting magic bytes after every CALL instruction and testing them in the function epilogue, as shown in Listing 1.

callee:
    ; epilogue
    mov  register, [esp]
    cmp  [register+1], MAGIC
    jnz  .1
    retn
.1: jmp  esp

caller:
    call callee
    test eax, MAGIC

Listing 1: Protection of execution flow changes made via return instructions.

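The following C sketch models what the Listing 1 epilogue checks, assuming the 32-bit little-endian layout of TEST EAX, imm32 (opcode 0xA9 followed by the immediate). The helper names and the MAGIC value in the usage example are hypothetical; a real implementation would be emitted as machine code by the compiler rather than written in C.

```c
#include <stddef.h>
#include <stdint.h>

#define TEST_EAX_IMM32 0xA9 /* opcode of TEST EAX, imm32 */

/* Emits "TEST EAX, magic" (A9 + little-endian imm32) at buf[0..4], as the
   compiler would right after every CALL. Returns the number of bytes written. */
size_t emit_magic(uint8_t *buf, uint32_t magic) {
    buf[0] = TEST_EAX_IMM32;
    for (int i = 0; i < 4; i++)
        buf[1 + i] = (uint8_t)(magic >> (8 * i));
    return 5;
}

/* Mirrors the epilogue check of Listing 1: read the 32-bit value located one
   byte after the return address ([register+1]) and compare it with MAGIC.
   Returns nonzero if RETN may proceed, zero if the .1 path would be taken. */
int return_target_is_valid(const uint8_t *code, size_t len, size_t ret_off,
                           uint32_t magic) {
    if (ret_off + 5 > len)
        return 0;
    uint32_t value = 0;
    for (int i = 0; i < 4; i++)
        value |= (uint32_t)code[ret_off + 1 + i] << (8 * i);
    return value == magic;
}
```

With a CALL at offset 0 and the magic emitted at offset 5, a return to offset 5 passes the check, whereas a return to an arbitrary gadget location that is not followed by the magic immediate fails it.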
This method seems to be more reliable than the method described in Section 3.1.1, although it also has some major drawbacks. First of all, the TEST instruction is not neutral with respect to the application's state, since it modifies the EFLAGS register. This flaw³, however, can easily be fixed by emitting a JMP OVER_MAGIC instruction after each CALL. A more serious limitation of this method is that every module used by the application would have to be built (compiled and linked) with the same MAGIC value. This is necessary because execution transfers may occur from one module to another⁴.

Since this approach would be almost impossible to implement in the real world, another solution can be used here. We propose that the Windows Portable Executable loader be responsible for synchronizing every MAGIC value after each system boot (and after a given module is loaded). This would, of course, require creating a new section (or a new, dedicated data directory) listing the offsets of all MAGIC values that the executable loader should update.

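One possible shape of the proposed loader step is sketched below. The table of MAGIC offsets, its layout, and the function name are assumptions made for illustration; the text only states that such offsets would live in a new section or data directory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-module table (e.g. stored in a new PE section or data
   directory) listing the image offsets of every MAGIC imm32 emitted after a
   CALL. The layout is an assumption made for this sketch. */
typedef struct {
    const uint32_t *offsets;
    size_t count;
} magic_table;

/* Overwrites every listed MAGIC slot in the mapped image with the value
   chosen for this boot. All offsets are validated before anything is
   written. Returns the number of slots patched, or -1 if an offset would
   write past the end of the image. */
long sync_magic_values(uint8_t *image, size_t image_size,
                       const magic_table *table, uint32_t boot_magic) {
    for (size_t i = 0; i < table->count; i++) {
        if (table->offsets[i] > image_size || image_size - table->offsets[i] < 4)
            return -1;
    }
    for (size_t i = 0; i < table->count; i++) {
        uint8_t *slot = image + table->offsets[i];
        for (int b = 0; b < 4; b++)
            slot[b] = (uint8_t)(boot_magic >> (8 * b));
    }
    return (long)table->count;
}
```

Validating every offset before writing keeps a malformed table from leaving the image partially patched.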
³ Whether this is a flaw or not depends mostly on the application binary interface; in most cases the caller is responsible for saving the flags.

⁴ Modules that do not perform execution transfers to other modules can be left "unsynchronized".

3.2 Obfuscating instructions

This approach addresses the problem where the RET instruction opcode is part of a different instruction (typically it is located among the first 1-3 bytes, not counting the instruction opcode). According to our tests and external sources [7], most such opcodes are found in the ModR/M byte. A second large source of RET opcodes is immediate displacements. In order to prevent such cases from being used effectively in a ROP attack, we propose that every instruction with a RET opcode inside its body be obfuscated in a special manner. Control transfer instructions, and any other instructions that use immediate data offsets, are of course an exception to this rule, since their immediate displacements are calculated by the linker. The obfuscation can be done in the following fashion:

• If a RET opcode is found in the first byte after the original instruction opcode, a jump land should be emitted just before this instruction. Such a jump land should consist of a short unconditional jump instruction and a land (up to 16 bytes) of INT3s or other single-byte instructions that are worthless to an attacker. The emitted padding will never be executed by the original program flow, because the unconditional jump transfers execution directly to the potentially dangerous instruction. This should decrease the number of effective gadgets available for building the ROP chain.

• If a RET opcode is spotted in an immediate constant value, the instruction should be obfuscated, for example by splitting ADD REG,IMM32 into two ADD instructions whose IMM32 operands are both free of return instruction opcodes. Special care must, of course, be taken regarding the EFLAGS register state after each such transformation.

• If a RET opcode is found in the ModR/M byte, which indicates EAX as the destination operand and EBX as the source operand (e.g., MOV EAX, EBX), the instruction can be transformed into an equivalent form that does not contain return instruction opcodes. For example, MOV EAX, EBX can be replaced with PUSH EBX; POP EAX (0x53 0x58).

As previously mentioned, the presented solutions cannot be applied to instructions whose immediate displacements are calculated by the linker.

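Two of the transformations above can be sketched in C: splitting an ADD immediate into two RET-free addends, and emitting a jump land (a short JMP over INT3 padding). This is an illustrative sketch under the assumption that only the near-return opcodes 0xC3 and 0xC2 need to be avoided; the function names are hypothetical, and encoding of the surrounding instructions is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if any byte of v is a near-return opcode. */
static int has_ret_byte(uint32_t v) {
    for (int i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(v >> (8 * i));
        if (b == 0xC3 || b == 0xC2)
            return 1;
    }
    return 0;
}

/* Splits ADD reg, imm into ADD reg, *first; ADD reg, *second such that
   *first + *second == imm (mod 2^32) and neither operand contains a RET
   byte. Returns 0 on success, -1 if no split is found in the search window.
   The flags left by the second ADD are not always identical to those of
   the single original ADD (e.g. CF, OF), so the caller must check whether
   EFLAGS are live afterwards. */
int split_add_imm32(uint32_t imm, uint32_t *first, uint32_t *second) {
    for (uint32_t a = 1; a < 0x10000; a++) {
        uint32_t b = imm - a;
        if (!has_ret_byte(a) && !has_ret_byte(b)) {
            *first = a;
            *second = b;
            return 0;
        }
    }
    return -1;
}

/* Emits a jump land: a short JMP (EB rel8) over n INT3 (CC) bytes, so the
   original flow lands directly on the instruction that follows. The text
   suggests n of at most 16. Returns the number of bytes written. */
size_t emit_jump_land(uint8_t *buf, uint8_t n) {
    buf[0] = 0xEB;
    buf[1] = n;
    for (uint8_t i = 0; i < n; i++)
        buf[2 + i] = 0xCC;
    return (size_t)n + 2;
}
```

For ADD EAX, 0xC3 the search rejects the split 1 + 0xC2 (0xC2 is RET imm16) and returns 2 and 0xC1, so the compiler would emit ADD EAX, 2 followed by ADD EAX, 0xC1.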


1: mov  esp, eax
   ret

2: xchg eax, esp
   ret

3: add  esp, <number>
   ret

Listing 2: Typical stack pivot sequences.



4 Binary-level mitigations

In this section we present mitigations against ROP attacks that can be applied without any information about the program's original source code. All mitigations included in this section can be implemented at the binary level.

4.1 Stack Encapsulation 

To make a ROP attack work, the attacker must be able to point the stack pointer at controlled data. In typical stack-buffer overflow vulnerabilities this is not needed, but in other vulnerabilities (e.g., heap overflows) it is often a must. To achieve this goal, attackers use a so-called stack pivot sequence [1]. Listing 2 shows some commonly used stack pivot sequences. Our mechanism tries to take advantage of this information.

When a new thread is created, the operating system reserves the necessary space for its stack memory. The stack borders are described in the INITIAL_TEB structure, which is passed as one of the parameters of the NtCreateThread function. The stack borders are also available in the Thread Information Block (FS:[0x04] holds the stack top, FS:[0x08] the current stack bottom). When the attacker uses a pivot sequence, the stack pointer typically moves outside the stack limits set by the thread initialization procedure. The methods described in the following sections were designed to recognize this behavior. Similar support must be provided for fibers, since they also use separate stacks.

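The property that the following methods detect can be stated as a simple range test. The sketch below assumes the stack borders were captured when the thread was created (the committed bottom at FS:[0x08] moves as the stack grows); the struct and function names are illustrative, not a Windows API.

```c
#include <stdint.h>

/* Stack borders of one thread (or fiber). On 32-bit Windows the top is
   available at FS:[0x04] and the current committed bottom at FS:[0x08];
   here the reserved range captured at thread creation is assumed instead. */
typedef struct {
    uint32_t bottom; /* lowest address reserved for the stack */
    uint32_t top;    /* highest address; the stack grows down from here */
} stack_bounds;

/* Returns nonzero if esp points inside the thread's own stack.
   A pivot such as "xchg eax, esp; ret" to a heap buffer fails this test. */
int esp_in_thread_stack(const stack_bounds *s, uint32_t esp) {
    return esp >= s->bottom && esp <= s->top;
}
```

With a stack spanning 0x00130000 to 0x00230000, an ESP of 0x0022FF80 passes, whereas an ESP pivoted to a sprayed heap address such as 0x0C0C0C0C fails.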
4.1.1 New stack segment descriptor 

Microsoft Windows systems allow usermode applications to create their own local descriptor table (LDT). Most current operating systems use the flat memory model, where there is no need to create additional segments for every running application; doing so would in fact be a step back to the old segmented memory model. On Windows platforms, in usermode, all segments' base addresses are equal to zero, except the one pointed to by the FS register (the GS segment register is not used⁵). In our mitigation mechanism we have developed two approaches that protect the system against the stack pivoting technique. Our initial technique was to create a stack segment descriptor each time a new thread is created, with a base address equal to the stack bottom and a limit corresponding to the stack size. After the new segment is created, we load the SS segment register with the new selector.

This method, however, has a big drawback, which is illustrated by Listing 3.

xor  eax, eax
lea  edi, [esp+VALUE]
stosd
stosd

Listing 3: Typical program instructions.

The LEA instruction initializes the EDI register with the effective address of ESP+VALUE. However, the value stored in EDI is still relative to the stack segment base address (which is not zero in our case). The problems start with instructions that do not use the SS segment register for addressing. For example, the STOSD instruction uses the ES segment register; its execution will end with an access violation, since the base address of the segment pointed to by ES is different. In other words, the LEA instruction does not take segment bases into account when calculating the effective address.

To resolve this issue we were forced to change the base address of the newly created stack segments. To avoid unnecessary access violations, the stack segment base address was set to zero and its limit was set to the stack's top value. This has a disadvantage: the mechanism is only triggered when the attacker sets the stack pointer to an address higher than the segment limit. Most of the time, however, newly allocated buffers have higher addresses, since the thread's stack memory was allocated earlier (there are a few obvious exceptions to this rule). Each time the attacker tries to exceed the boundaries of the current stack segment, a general protection fault occurs and, at this point, our filtering procedure decides whether the process is being exploited and needs to be terminated.

⁵ This is true for x86-32 architectures only.

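The effect of both descriptor configurations can be modelled with a small C sketch of an expand-up segment (byte granularity assumed; the type and function names are ours). A failed seg_access call stands for the CPU fault that hands control to the filtering procedure.

```c
#include <stdint.h>

/* Minimal model of an expand-up 32-bit data/stack segment descriptor. */
typedef struct {
    uint32_t base;
    uint32_t limit; /* highest valid offset, byte granularity */
} segment;

/* Returns 1 and stores the linear address (base + offset) if an access of
   `size` bytes at `offset` stays within the segment limit, or 0 if the CPU
   would raise a segment-limit fault instead. */
int seg_access(const segment *s, uint32_t offset, uint32_t size,
               uint32_t *linear_out) {
    if (size == 0 || offset > s->limit || s->limit - offset < size - 1)
        return 0;
    *linear_out = s->base + offset;
    return 1;
}
```

With the first configuration (SS base at the stack bottom, ES flat), the same EDI offset reaches different linear addresses through SS and through ES, which is the LEA/STOSD problem of Listing 3. With the second configuration (base 0, limit at the stack top), ordinary stack accesses succeed, a pivot to 0x0C0C0C0C exceeds the limit, and a pivot to a lower address is not caught, matching the countermeasure discussed below.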
As a side note, there is one small problem with this method. Instructions that use the EBP register for memory addressing also use the stack segment selected by the SS register. This means that if EBP does not point into stack memory and the destination address exceeds the stack segment boundaries, a general protection fault will occur. Such cases, however, can easily be filtered, and execution can be resumed after emulating the faulting instruction.

Countermeasures In order to bypass the stack encapsulation protection, an attacker would need to initialize the stack pointer with a memory address lower than the stack's top value. For example, an attacker can heap-spray the memory and then cause the application to create a new thread that will be used to trigger the vulnerability. By doing this, the attacker's fake stack will lie below the new thread's stack base. Another way would be to execute a gadget that reloads the stack segment register with its original value (which is constant across Windows versions), for example with a POP SS instruction. To prevent this attack, we constantly monitor the value of the SS segment register and reinitialize it every time execution returns from a system call (since the kernel reinitializes the segment register values before control is returned to usermode).

4.1.2 Monitoring stack pointer changes 

Another approach for detecting the stack pivoting technique is to monitor the stack pointer value at crucial points. For example, instead of setting up another segment for the stack, we can hook API functions that are important to an attacker (e.g., VirtualAlloc, VirtualProtect) and test the stack pointer value there. Obviously, there is no guarantee that the attacker will not restore the original stack pointer before calling such API functions. To improve the security level of this protection mechanism, we also propose that newly allocated memory regions (or memory regions with changed page protection) containing executable pages be marked as non-executable⁶. A page marked as non-executable by our mechanism then works as a decoy. If the processor tries to execute such a page (whose protection was previously changed by our mechanism), we first apply our filtering procedure, which tests the stack pointer value. If everything is correct, execute rights are re-enabled and execution continues; the entire mechanism works like a one-time decoy.

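The one-time decoy logic can be sketched as a fault-handler decision, assuming the page fault filter already knows the faulting address and the thread's stack range. Restoring the execute right (for example with VirtualProtect) is left to the caller; the types and names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

typedef enum { DECOY_NOT_OURS, DECOY_RESUME, DECOY_TERMINATE } decoy_action;

/* One executable page whose execute right was revoked by the mechanism. */
typedef struct {
    uint32_t page; /* page-aligned address */
    int armed;     /* 1 while the page still acts as a decoy */
} decoy_page;

/* Called on an execute fault at fault_eip with the thread's current ESP.
   If the page is one of our decoys, ESP is validated against the thread's
   stack [stack_bottom, stack_top]. On success the decoy is disarmed (one
   time only) and the caller should restore execute rights and resume. */
decoy_action on_exec_fault(decoy_page *decoys, size_t n, uint32_t fault_eip,
                           uint32_t esp, uint32_t stack_bottom,
                           uint32_t stack_top) {
    uint32_t page = fault_eip & ~(PAGE_SIZE - 1);
    for (size_t i = 0; i < n; i++) {
        if (decoys[i].armed && decoys[i].page == page) {
            if (esp < stack_bottom || esp > stack_top)
                return DECOY_TERMINATE; /* stack pivot suspected */
            decoys[i].armed = 0;        /* one-time decoy */
            return DECOY_RESUME;
        }
    }
    return DECOY_NOT_OURS;
}
```

A fault raised while ESP points into a sprayed heap leaves the decoy armed and asks for termination; the first fault with a sane ESP disarms it, and later faults on that page are no longer handled by the mechanism.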


(Figure 2 panels: BEFORE (same code segment) and AFTER (new code segments), each showing APP.EXE, KERNEL32.DLL and NTDLL.DLL in process memory.)



4.2 Code Encapsulation 

The ROP technique, like any other, has some strategic weak points. One of them is that the attacker must know the virtual addresses of the gadgets used. For this reason, the ASLR mechanism successfully obstructs the exploitation process. However, in some cases the attacked application is either not compatible with ASLR or uses external modules that do not support ASLR. There are also cases where the attacker is able to leak or guess the needed virtual address, making ASLR relatively easy to bypass. In this section we present a mechanism that takes advantage of this weak spot of the ROP technique.

As stated in Section 4.1.1, Windows systems allow usermode applications to create their own local descriptor tables. In this mechanism we propose that the code of each module loaded into the application's address space (including the main module) gets a separate code segment⁷, as Figure 2 shows. Each time an execution transfer between modules, or an execution transfer using a full virtual address (including the module's image base), occurs, a general protection fault is raised. At this point the filtering procedure decides whether the execution transfer attempt is valid or an attack.

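A rough model of the per-module code segments is shown below, assuming base = start of the module's code and limit = code size minus one. It shows when a near transfer faults and how a filter could map a full virtual address back to the owning module; the accept/reject policy itself is not modelled, and all names and addresses are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-module code segment: base = module code start,
   limit = size of the module's code minus one. */
typedef struct {
    const char *name;
    uint32_t base;
    uint32_t limit;
    uint16_t selector; /* LDT selector (pseudo-randomized per the text) */
} code_segment;

/* A near transfer inside the current CS faults only if the target offset
   lies beyond the segment limit. Returns nonzero if a fault would occur. */
int near_transfer_faults(const code_segment *cs, uint32_t target_offset) {
    return target_offset > cs->limit;
}

/* On a fault, the filtering procedure can interpret the target as a full
   virtual address and look up the module segment that contains it.
   Returns the index of that segment, or -1 if no module owns the address. */
int owning_segment(const code_segment *segs, size_t n, uint32_t va) {
    for (size_t i = 0; i < n; i++) {
        if (va >= segs[i].base && va - segs[i].base <= segs[i].limit)
            return (int)i;
    }
    return -1;
}
```

The selectors 0x0F and 0x17 used in a usage example would be LDT selectors (TI bit set, RPL 3) for descriptor indexes 1 and 2. Note that a small foreign address below the current limit does not fault at all, which is the large-module problem discussed among the drawbacks.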
⁶ This requires a CPU with NX bit support.

⁷ This mechanism is somewhat similar to PaX SEGMEXEC [6].

Figure 2: Non-encapsulated and encapsulated modules inside the process memory.

This method has some drawbacks:

• Many control transfers are performed through API calls, and since they require a code segment switch, a general protection fault is raised every time such a call occurs. Because this has a negative impact on the application's performance, all import address table entries should be redirected to dedicated API stubs, as shown in Listing 4.

  This solution should reduce the negative performance impact thanks to the decreased number of GP faults. However, this is only one aspect of the problem, since the requested API must be able to return correctly to a location outside the current code segment. There are a few ways to solve this issue. One potential solution is to fake the return address in the API land stub and then recalculate the correct address when the RET instruction causes a GP fault. Every potential solution here, however, will decrease program performance. Additionally, since API lands can only be generated either before the base address of the given module or appended to its end, the code segment borders need to be expanded as well (at least in cases that do not overwrite the module's memory).

• Special care must be taken when dealing with case-switch offsets (jump tables), since they also contain virtual addresses that do not fit the new code segment limits. This issue can be partially resolved by using the module's relocation information and applying a heuristic scanning mechanism. All case-switch offsets found should then be recalculated so that they hold segment-relative addresses. However, some modules do not provide relocation information, which makes dealing with such cases hard and probably slow.

• Some Portable Executable modules, like SHELL32.DLL, are quite large (8 MB to 16 MB). This causes an additional problem: if a callback function located in a different module has a virtual address somewhere within this 0 to X MB range (X being the size of the current module's code segment), and some instruction tries to execute that address, the mechanism fails. The callback's virtual address lies within the limits of the current code segment, so no GP fault occurs and the address is interpreted as an offset into the current module's code instead. This is a major drawback, since it will likely lead to an application crash. A potential workaround would be to disallow (or reserve) the memory located in the 0 to X MB range. However, this would require interaction with the system's Portable Executable loader.

• As explained before, every time the kernel returns control to usermode, the segment registers are reinitialized with default values. Thus, the protection mechanism needs to reinitialize them as well each time this happens.

• Additionally, the number of segment descriptors is limited (an LDT can hold at most 8192 descriptors); however, this is not a problem for most applications, since the number of loaded modules is not high.

CALL DWORD PTR DS:[0x406010]

original memory at 0x406010:
(ptr to user32.CreateWindowExA)
00406010  A9 E4 37 7E

patched memory at 0x406010:
00406010  dd offset api_land1

api_land1:
    user32 cs:rel offset

Listing 4: Example implementation of IAT redirection.


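The IAT patch from Listing 4 can be sketched as follows, assuming the module image is mapped as a byte array starting at image_va. The api_land record and the function name are hypothetical; generating the stub code and performing the code segment switch are outside this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one redirected import: the IAT slot now holds
   the address of an API land stub, and the stub needs the original target
   (e.g. user32!CreateWindowExA) to perform the code segment switch. */
typedef struct {
    uint32_t iat_slot_va;
    uint32_t original_target;
    uint32_t stub_va;
} api_land;

static uint32_t read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void write_u32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Replaces the little-endian IAT entry at slot_va (inside an image mapped
   at image_va) with stub_va and records the original target in *out.
   Returns 0 on success, -1 if the slot lies outside the image. */
int redirect_iat_slot(uint8_t *image, uint32_t image_va, size_t image_size,
                      uint32_t slot_va, uint32_t stub_va, api_land *out) {
    if (image_size < 4 || slot_va < image_va ||
        slot_va - image_va > image_size - 4)
        return -1;
    uint8_t *slot = image + (slot_va - image_va);
    out->iat_slot_va = slot_va;
    out->original_target = read_u32(slot);
    out->stub_va = stub_va;
    write_u32(slot, stub_va);
    return 0;
}
```

With the values of Listing 4, the slot at 0x406010 initially holds 0x7E37E4A9 (bytes A9 E4 37 7E); in the usage example it is redirected to a stub placed just past the end of a 64 KB image mapped at 0x400000, one of the stub placements the text mentions.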

Countermeasures The attacker would have to restore the original CS register value, for example by returning into a RETF instruction. To protect against such attacks, first, the current CS segment will be monitored at crucial program points and, second, all newly generated code segment selector values will be pseudo-randomized.

4.3 Code Decoys 

This approach requires a processor with NX bit [5] support. The method itself is rather simple and can be described in a few steps:

1. The mechanism sets up a page fault filter and a module filter, which is activated each time a new module is mapped into process memory.

2. All code sections of the selected module found in process memory are relocated (mirrored) to a random memory address, preserving the section alignment (see Figure 3).

3. After the relocation is done, the original code sections are marked as non-executable (see Figure 3).

4. Each time a page fault occurs because of an attempt to execute non-executable memory, the filtering procedure decides whether to recalculate the instruction pointer and continue execution, or to kill the process because of an exploitation attempt.

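Step 4 can be sketched as the translation the filter performs for a single mirrored section. The text does not specify how the filter decides whether a transfer is legitimate, so the sketch takes that decision as an input; the structure and names are hypothetical.

```c
#include <stdint.h>

/* One mirrored code section: the original copy is non-executable (the
   decoy), the mirror at a random address is executable. */
typedef struct {
    uint32_t orig_base;
    uint32_t mirror_base;
    uint32_t size;
} code_mirror;

typedef enum { FAULT_NOT_OURS, FAULT_REDIRECT, FAULT_KILL } fault_action;

/* Handles an execute fault at eip. transfer_is_legitimate stands for the
   filtering decision, which is left open here (hypothetical input).
   On FAULT_REDIRECT, *new_eip receives the equivalent mirror address; with
   page-aligned bases the offset within the page is unchanged. */
fault_action on_decoy_exec_fault(const code_mirror *m, uint32_t eip,
                                 int transfer_is_legitimate,
                                 uint32_t *new_eip) {
    if (eip < m->orig_base || eip - m->orig_base >= m->size)
        return FAULT_NOT_OURS;
    if (!transfer_is_legitimate)
        return FAULT_KILL;
    *new_eip = m->mirror_base + (eip - m->orig_base);
    return FAULT_REDIRECT;
}
```

For a section at 0x7C801000 mirrored to 0x3A5F1000, a legitimate fault at 0x7C80AC7F is redirected to 0x3A5FAC7F, keeping the page offset 0xC7F.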


(Figure 3 panels: ORIGINAL MEMORY, marked as not-executable, and MIRRORED MODULES, code with executable rights, each showing APP.EXE, KERNEL32.DLL and NTDLL.DLL.)



• Special care must be taken when dealing with hooks placed in the original code, since such cases exist in some applications (for example, in IEXPLORE.EXE this is done by the IEFRAME.DLL module).



Figure 3: Code decoys created from original modules.

A similar idea was also used by the PaX Team in the RANDEXEC mechanism and by Matt Miller in the WehnTrust project [1].
To improve the performance of this mechanism, special care must be taken when dealing with case-switch offset tables, as already mentioned in Section 4.2. Additional performance improvements can be achieved with import address table redirection, similar to the idea explained in Section 4.2. It is important to point out, however, that this idea can also lower the protection level of this mechanism.

Countermeasures The attacker would have to 
guess or leak the mirrored code address. 
6 Conclusion

In this article, a number of promising techniques that can be used against return-oriented programming attacks were presented.

Most of the implementation problems of such mitigations are directly linked to a heavy performance impact. This is also a major factor discouraging the incorporation of these (and other) ROP mitigations into the selected platforms. Our security mitigations do not solve the problem of return-oriented programming attacks completely, but they can effectively hinder and limit their use.